Conversational speech recognition using acoustic and articulatory input

نویسندگان

  • Katrin Kirchhoff
  • Gernot A. Fink
  • Gerhard Sagerer
چکیده

The combination of multiple speech recognizers based on different signal representations is increasingly attracting interest in the speech community. In previous work we presented a hybrid speech recognition system based on the combination of acoustic and articulatory information which achieved significant word error rate reductions under highly noisy conditions on a small-vocabulary numbers recognition task. In this study we extend this approach to large-vocabulary conversational speech recognition using the Gaussian mixture acoustic modeling paradigm. We demonstrate that the articulatory input representation we propose contains information which is complementary to that provided by standard MFCC features, and that their combination can significantly reduce the word error rate on conversational speech. Various combination strategies (feature-level, state-level and word-level combination) are compared and evaluated.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combining acoustic and articulatory feature information for robust speech recognition

The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label ‘‘articulatory’’ include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...

متن کامل

Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition

This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task. Mel-filterbank energies and perceptually motivated forced damped oscillator coefficient (DOC) features are used after feature-space maximum-likelihood linear regression (fMLLR) transforms, whic...

متن کامل

Tonal articulatory feature for Mandarin and its application to conversational LVCSR

This paper presents our recent work on the development of a tonal Articulatory Feature (AF) for Mandarin and its application to conversational LVCSR. Motivated by the theory of Mandarin phonology, eight features for classifying the acoustic units and one feature for classifying the tone are investigated and constructed in the paper, and the AF-based tandem approach is used to improve speech rec...

متن کامل

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

Using multiple acoustic feature sets for speech recognition

In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of both auditory as well as articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly using linear discriminant analysis (LDA)...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000